Video-Audio Domain Generalization via Confounder Disentanglement

Abstract

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confounders affecting both video-audio features and labels. We propose a DeVADG framework that conducts uni-modal and cross-modal deconfounding through back-door adjustment. DeVADG performs cross-modal disentanglement and obtains fine-grained confounders at both class-level and domain-level using half-sibling regression and unpaired domain transformation, which essentially identifies domain-variant factors and class-shared factors that cause spurious correlations between features and false labels. To promote VADG research, we collect a VADG-Action dataset for video-audio action recognition with 5,000 video clips across four domains (e.g., cartoon and game) and ten action classes (e.g., cooking and riding). We conduct extensive experiments, i.e., multi-source DG, single-source DG, and qualitative analysis, validating the rationality of our causal analysis and the effectiveness of the DeVADG framework.
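
The back-door adjustment named in the abstract can be illustrated on a toy discrete model. The sketch below is not the DeVADG implementation; it only shows the standard identity P(y | do(x)) = sum over z of P(y | x, z) P(z). The variables z (confounder), x (feature), y (label) and all probability values are hypothetical, chosen so that y depends on z alone while x merely correlates with z.

```python
# Toy back-door adjustment. All numbers are hypothetical assumptions.
# z: binary confounder (e.g. a domain factor), x: binary feature, y: binary label.

p_z = {0: 0.5, 1: 0.5}                      # P(z)
p_x_given_z = {0: {0: 0.8, 1: 0.2},         # P(x | z), indexed as [z][x]
               1: {0: 0.2, 1: 0.8}}
# P(y=1 | x, z), keyed by (x, z). By construction y depends only on z,
# so any dependence of y on x is purely spurious.
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.1,
                 (0, 1): 0.9, (1, 1): 0.9}


def observational(x):
    """P(y=1 | x): weights each z by P(z | x), so the confounder leaks in."""
    joint = {z: p_z[z] * p_x_given_z[z][x] for z in p_z}
    norm = sum(joint.values())
    return sum(p_y1_given_xz[(x, z)] * joint[z] / norm for z in p_z)


def backdoor(x):
    """P(y=1 | do(x)): weights each z by its marginal P(z) instead."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 3), round(backdoor(x), 3))
```

In this toy model the observational estimate differs sharply between x = 0 and x = 1 (about 0.26 versus 0.74), while the adjusted estimate is the same for both, which is the sense in which adjusting for a confounder removes a spurious correlation.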

Similar resources

Domain Generalization via Invariant Feature Representation

This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship betwe...

Finite-time disentanglement via spontaneous emission.

We show that under the influence of pure vacuum noise two entangled qubits become completely disentangled in a finite time, and in a specific example we find the time to be given by ln((2 + sqrt(2))/2) times the usual spontaneous lifetime.

Confounder selection via penalized credible regions.

When estimating the effect of an exposure or treatment on an outcome it is important to select the proper subset of confounding variables to include in the model. Including too many covariates increases mean square error on the effect of interest while not including confounding variables biases the exposure effect estimate. We propose a decision-theoretic approach to confounder selection and ef...

Video Abstraction in H.264/AVC Compressed Domain

Video abstraction allows searching, browsing and evaluating videos only by accessing the useful contents. Most of the studies are using pixel domain, which requires the decoding process and needs more time and process consuming than compressed domain video abstraction. In this paper, we present a new video abstraction method in H.264/AVC compressed domain, AVAIF. The method is based on the norm...

Temporal Generalization with Domain Generalization Graphs

This paper addresses the problem of using domain generalization graphs to generalize temporal data extracted from relational databases. A domain generalization graph associated with an attribute defines a partial order which represents a set of generalization relations for the attribute. We propose formal specifications for domain generalization graphs associated with calendar (date and time) att...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i12.26787